Towards clustering-based word sense discrimination
Abstract
This paper describes a series of experiments conducted to group similar words using context features derived from a corpus. The goal is to find an approach suitable for cleaning the fuzzy WordNet synsets obtained by automatically translating Serbian synsets into Slovene. Similar techniques have already been used successfully by a number of researchers, and they are attractive particularly because they are knowledge-lean and rely only on evidence found in raw text. A selection of features and settings is tested on sample test sets with an unsupervised machine learning method, hierarchical clustering. In the final part of the paper, the results obtained are analyzed and the optimal set of features is selected, followed by a discussion of the results and some further...
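As a rough illustration of the general technique only (not the authors' implementation; the abstract does not state which features or clustering settings were used), the sketch below builds bag-of-words context vectors for a few target words and groups them with agglomerative hierarchical clustering. Everything concrete here is an assumption chosen for illustration: a window of ±2 tokens, raw co-occurrence counts, cosine similarity, average linkage, stopping at a fixed number of clusters, and the invented toy sentences. The helper names `context_vectors`, `cosine` and `hierarchical_clustering` are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def context_vectors(sentences, targets, window=2):
    """Count words co-occurring with each target within +/- `window` tokens.

    The window size is an illustrative assumption, not the paper's setting.
    """
    vectors = {t: Counter() for t in targets}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok in vectors:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        vectors[tok][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def average_linkage(c1, c2, sim):
    """Mean pairwise similarity between the members of two clusters."""
    total = sum(sim[frozenset((x, y))] for x in c1 for y in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clustering(vectors, n_clusters):
    """Agglomerative clustering: repeatedly merge the two most similar clusters."""
    words = sorted(vectors)
    sim = {frozenset((x, y)): cosine(vectors[x], vectors[y])
           for x, y in combinations(words, 2)}
    clusters = [[w] for w in words]  # start with one singleton cluster per word
    while len(clusters) > n_clusters:
        # combinations() yields i < j, so deleting index j after updating i is safe
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], sim))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Invented toy corpus, for illustration only.
corpus = [
    "the cat drinks milk every morning",
    "the dog drinks water every morning",
    "a car needs fuel on the road",
    "a truck needs fuel on the highway",
]
vecs = context_vectors(corpus, ["cat", "dog", "car", "truck"])
print(hierarchical_clustering(vecs, n_clusters=2))
# [['car', 'truck'], ['cat', 'dog']]
```

In a synset-cleaning setting, the targets might instead be the literals of one automatically translated synset, with outliers that fall into a separate cluster flagged for review; whether the paper used clustering in exactly this way is not stated in the abstract.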
Similar resources
Word sense discrimination in information retrieval: A spectral clustering-based approach
Word sense ambiguity has been identified as a cause of poor precision in information retrieval (IR) systems. Word sense disambiguation and discrimination methods have been defined to help systems choose which documents should be retrieved in relation to an ambiguous query. However, the only approaches that show a genuine benefit for word sense discrimination or disambiguation in IR are generall...
Towards Dynamic Word Sense Discrimination with Random Indexing
Most distributional models of word similarity represent a word type by a single vector of contextual features, even though words commonly have more than one sense. The multiple senses can be captured by employing several vectors per word in a multi-prototype distributional model, prototypes that can be obtained by first constructing all the context vectors for the word and then clustering simi...
Word Sense Discrimination by Clustering Contexts in Vector and Similarity Spaces
This paper systematically compares unsupervised word sense discrimination techniques that cluster instances of a target word that occur in raw text using both vector and similarity spaces. The context of each instance is represented as a vector in a high dimensional feature space. Discrimination is achieved by clustering these context vectors directly in vector space and also by finding pairwis...
Name Discrimination and Email Clustering using Unsupervised Clustering and Labeling of Similar Contexts
In this paper, we apply an unsupervised word sense discrimination technique based on clustering similar contexts (Purandare and Pedersen, 2004) to the problems of name discrimination and email clustering. Names of people, places, and organizations are not always unique. This can create a problem when we refer to or seek out information about such entities. When this occurs in written text, we s...
Discriminating Among Word Meanings by Identifying Similar Contexts
Word sense discrimination is an unsupervised clustering problem, which seeks to discover which instances of a word (or words) are used with the same meaning. This is done strictly based on information found in raw corpora, without using any sense tagged text or other existing knowledge sources. Our particular focus is to systematically compare the efficacy of a range of lexical features, context represent...
SenseClusters - Finding Clusters that Represent Word Senses
SenseClusters is a freely available word sense discrimination system that takes a purely unsupervised clustering approach. It uses no knowledge other than what is available in a raw unstructured corpus, and clusters instances of a given target word based only on their mutual contextual similarities. It is a complete system that provides support for feature selection from large corpora, several ...
Publication date: 2006